ggRandomForests: Exploring Random Forest Survival

نویسندگان

  • John Ehrlinger
  • Eugene H. Blackstone
چکیده

Random forest (Breiman 2001a) (RF) is a non-parametric statistical method requiring no distributional assumptions on covariate relation to the response. RF is a robust, nonlinear technique that optimizes predictive accuracy by fitting an ensemble of trees to stabilize model estimates. Random survival forests (RSF) (Ishwaran and Kogalur 2007; Ishwaran, Kogalur, Blackstone, and Lauer 2008) are an extension of Breiman’s RF techniques allowing efficient non-parametric analysis of time to event data. The randomForestSRC package (Ishwaran and Kogalur 2014) is a unified treatment of Breiman’s random forest for survival, regression and classification problems. Predictive accuracy makes RF an attractive alternative to parametric models, though complexity and interpretability of the forest hinder wider application of the method. We introduce the ggRandomForests package, tools for visually understand random forest models grown in R (R Core Team 2014) with the randomForestSRC package. The ggRandomForests package is structured to extract intermediate data objects from randomForestSRC objects and generate figures using the ggplot2 (Wickham 2009) graphics package. This document is structured as a tutorial for building random forest for survival with the randomForestSRC package and using the ggRandomForests package for investigating how the forest is constructed. We analyse the Primary Biliary Cirrhosis (PBC) of the liver data from a clinical trial at the Mayo Clinic (Fleming and Harrington 1991). We demonstrate random forest variable selection using Variable Importance (VIMP) (Breiman 2001a) and Minimal Depth (Ishwaran, Kogalur, Gorodeski, Minn, and Lauer 2010), a property derived from the construction of each tree within the forest. We will also demonstrate the use of variable dependence and partial dependence plots (Friedman 2000) to aid in the interpretation of RSF results. We then examine variable interactions between covariates using conditional variable dependence plots. Our aim is to demonstrate the strength of using Random Forest methods for both prediction and information retrieval, specifically in time to event data settings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ggRandomForests: Survival with Random Forests

Random Forests (Breiman 2001) (RF) are a fully non-parametric statistical method requiring no distributional assumptions on covariate relation to the response. RF are a robust, nonlinear technique that optimizes predictive accuracy by fitting an ensemble of trees to stabilize model estimates. Random Forests for survival (Ishwaran and Kogalur 2007; Ishwaran, Kogalur, Blackstone, and Lauer 2008) ...

متن کامل

Comparison of Random Survival Forests for Competing Risks and Regression Models in Determining Mortality Risk Factors in Breast Cancer Patients in Mahdieh Center, Hamedan, Iran

Introduction: Breast cancer is one of the most common cancers among women worldwide. Patients with cancer may die due to disease progression or other types of events. These different event types are called competing risks. This study aimed to determine the factors affecting the survival of patients with breast cancer using three different approaches: cause-specific hazards regression, subdistri...

متن کامل

Random survival forests for high-dimensional data

Minimal depth is a dimensionless order statistic that measures the predictiveness of a variable in a survival tree. It can be used to select variables in high-dimensional problems using Random Survival Forests (RSF), a new extension of Breiman’s Random Forests (RF) to survival settings. We review this methodology and demonstrate its use in high-dimensional survival problems using a public domai...

متن کامل

Survival of Dialysis Patients Using Random Survival Forest Model in Low-Dimensional Data with Few-Events

Background:Dialysis is a process for eliminating extra uremic fluids of patients with chronic renal failure. The present study aimed to determine the variables that influence the survival of dialysis patients using random survival forest model (RSFM) in low-dimensional data with low events per variable (EPV). Methods:In this historical cohort study, infor...

متن کامل

Evaluation of risk factors of recurrence of hodgkin\'s lymphoma using random survival forest and comparison with cox regression model

Background: In many studies, Cox regression was used to assess the important factors that affect the survival of cancer patients based on demographic and clinical variables. The aim of this study was to determine the factors affecting the survival of patients with Hodgkin's lymphoma using the random survival forest (RSF) method and compare it with the Cox model. Methods: In this retrospective ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015